A note on the border of an exponential family 
Abstract Limits of densities belonging to an exponential family appear in many
applications, e.g. Gibbs models in Statistical Physics, relaxed combinatorial
optimization, coding theory, critical likelihood computations, Bayes priors with
singular support, random generation of factorial designs. We discuss the problem
from the methodological point of view in the case of a finite state space. We prove
two characterizations of the limit distributions, both based on a suitable description
of the marginal polytope (convex hull of canonical statistics' values). First, the set
of limit densities is equal to the set of conditional densities given a face of the
marginal polytope. Second, in the lattice case there exists a parametric presentation,
in monomial form, of the closure of the statistical model.

Key words: Algebraic Statistics, Convex Support, Extended Exponential Family, 
Statistical Modeling. 



1 Background 

We consider the exponential family defined by the family of densities

$$p(x;\theta) = \exp\Big(\sum_{j=1}^{m} \theta_j T_j(x) - \psi(\theta)\Big), \qquad \theta \in \mathbb{R}^m, \tag{1}$$

on a finite state space $(\mathcal{X},\mu)$ with $n = \#\mathcal{X}$ points and reference measure $\mu$. Many
monographs have been devoted to the study of this important class of statistical
models, e.g. lfni2l fTTIl . In this section we have collected facts from this theory and
its algebraic version in order to introduce our results, discussed in Section 2.

Different exponential families could represent the same statistical model. Consider
the orthogonal decomposition $\operatorname{Span}(1, T_1, \dots, T_m) = 1 \oplus V \subset L^2(\mathcal{X},\mu)$. In
fact, $V \subset L^2_0(\mathcal{X},\mu)$. For each density $p$ in the exponential family (1) there exists a
unique $v \in V$ such that $p(x) \propto e^{v(x)}$, see QUO.

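The canonical statistics

$$T = (T_1,\dots,T_m)\colon \mathcal{X} \to \mathcal{Y} \subset \mathbb{R}^m$$

map the statistical model (1) to the canonical exponential family

$$p(y;\theta) = \exp\Big(\sum_{j=1}^{m} \theta_j y_j - \psi(\theta)\Big), \qquad \theta \in \mathbb{R}^m, \tag{2}$$

where the new state space is $(\mathcal{Y},\nu)$, with $\nu = \mu \circ T^{-1}$. In Equation (2), the canonical
statistics are the coordinate projections $y \mapsto y_j$, $j = 1,\dots,m$.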



1.1 Monomial and implicit presentations 

Other useful parameterizations of the exponential family (1) are available, in
particular the mean parameterization, which shall be discussed in Section 1.3. In this
paper we focus on a less known parameterization, i.e. the monomial parameterization,
which is obtained from (1) by introducing the exponentials $t_j = e^{\theta_j}$ of each
canonical parameter $\theta_j$, $j = 1,\dots,m$,

$$p(x;t) \propto \prod_{j=1}^{m} t_j^{T_j(x)}, \qquad t_j > 0. \tag{3}$$

This presentation is especially useful in the lattice case, i.e. when the canonical
statistics are integer valued. This is the case which has been studied with the
methods of Algebraic Statistics, see e.g. [13, Sec. 6.9], Q-

While Equations (1) and (3) are equivalent for positive densities, an interesting
phenomenon appears if the conditions $t_j > 0$ are relaxed to $t_j \ge 0$. In such a case,
(3) still makes sense and an extension of the original model is obtained, see [15, 16].
For example, assume we let just one of the $t_j$'s, say $t_1$, be zero. It follows that the
corresponding unnormalized density is zero if $T_1(x) \ne 0$ and is positive if $T_1(x) = 0$,
giving rise to densities with support $\{T_1 = 0\}$ which form a new exponential family.
Thus, the exponential family (1) is extended to include exponential families with
defective support. Unfortunately, such an extension depends on the canonical statistics
used to describe the statistical model as an exponential family. For example, if the
chosen canonical statistics are never zero, no such extension is possible.
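The effect of a zero monomial parameter can be illustrated numerically. The following is a minimal sketch on a hypothetical toy model (state space $\{0,1\}^2$, statistics $T_1(x) = x_1$, $T_2(x) = x_2$, uniform reference measure); the function name `monomial_density` and all numerical values are assumptions made for illustration only.

```python
from itertools import product

# Hypothetical toy model: states in {0,1}^2, T1(x) = x1, T2(x) = x2.
states = list(product([0, 1], repeat=2))

def monomial_density(t, shift=0):
    """Normalized density proportional to t1^(T1+shift) * t2^T2, or None if all weights vanish."""
    weights = [t[0] ** (x[0] + shift) * t[1] ** x[1] for x in states]
    total = sum(weights)
    if total == 0:
        return None                      # no probability is defined
    return {x: w / total for x, w in zip(states, weights)}

p = monomial_density((0.0, 3.0))             # t1 = 0 is allowed once t_j >= 0
support = [x for x in states if p[x] > 0]    # the set {T1 = 0}

# With the never-zero statistic T1 + 1 the same choice t1 = 0 kills every monomial.
degenerate = monomial_density((0.0, 3.0), shift=1)
```

Here the support shrinks to the two points with $x_1 = 0$, while replacing $T_1$ by the never-zero statistic $T_1 + 1$ leaves no admissible density, in line with the remark on the dependence on the chosen statistics.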
Statistical models of type (1) admit an implicit representation, see [13, 12]. Let
$1 \oplus V = \operatorname{Span}(T_0 = 1, T_1, \dots, T_m)$ be the linear space generated by the canonical
statistics together with the constant 1, and let $w_1,\dots,w_l$ be a linear basis of the
orthogonal space $(1 \oplus V)^\perp$, i.e., $1, T_1, \dots, T_m, w_1, \dots, w_l$ is a linear basis of $L^2(\mathcal{X},\mu)$
and

$$\sum_{x\in\mathcal{X}} w_i(x) T_j(x) \mu(x) = 0, \qquad i = 1,\dots,l, \quad j = 0,\dots,m.$$

If we introduce the $(m+1) \times n$ matrix $A = [T_j(x)\mu(x)]$, $j = 0,\dots,m$, $x \in \mathcal{X}$, $T_0 = 1$,
then $\operatorname{Span}(w_1,\dots,w_l) = \ker A$. The case where $A$ is integer valued is discussed in
[9]. The general case is discussed in [11].

Since $\log p(\cdot;\theta)$ is an affine function of the canonical statistics $T_j$'s, a density $p$
belongs to the exponential model (1) if and only if $p$ is a positive density of $(\mathcal{X},\mu)$
and

$$\sum_{x\in\mathcal{X}} w(x)\mu(x) \log p(x) = 0, \qquad w \in \operatorname{Span}(w_1,\dots,w_l). \tag{4}$$

More precisely, if $p = p(\cdot;\theta)$ in (1) for a $\theta$, then (4) holds true; vice versa, if
$\sum_{x\in\mathcal{X}} w(x)\mu(x) \log p(x) = 0$ holds true for $w = w_i$, $i = 1,\dots,l$, then $p = p(\cdot;\theta)$
for some $\theta$.
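The implicit condition can be checked on a small case. The sketch below assumes a hypothetical model on $\{0,1,2\}$ with $T(x) = x$ and uniform reference measure, for which $w = (1,-2,1)$ spans the orthogonal space of $\operatorname{Span}(1,T)$; the names `density` and `implicit_form` are illustrative.

```python
import math

# Hypothetical toy model: X = {0, 1, 2}, T(x) = x, uniform reference measure.
states = [0, 1, 2]
mu = [1.0, 1.0, 1.0]
w = [1.0, -2.0, 1.0]          # orthogonal to both 1 and T with respect to mu

def density(theta):
    """Exponential family density p(x; theta) = exp(theta * x - psi(theta))."""
    weights = [math.exp(theta * x) for x in states]
    z = sum(m * wt for m, wt in zip(mu, weights))
    return [wt / z for wt in weights]

def implicit_form(p):
    """Left-hand side of the implicit equation: sum of w(x) mu(x) log p(x)."""
    return sum(wx * m * math.log(px) for wx, m, px in zip(w, mu, p))

inside = implicit_form(density(0.7))            # vanishes up to rounding
outside = implicit_form([0.5, 0.25, 0.25])      # a positive density outside the model
```

The first value is zero up to floating point error, while the second equals $\log 2$, so the positive density $(0.5, 0.25, 0.25)$ does not belong to this toy model.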

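Equation (4) is equivalent to the following equation

$$\prod_{x\in\mathcal{X}} p(x)^{w(x)} = 1, \qquad w/\mu \in \operatorname{Span}(w_1,\dots,w_l), \tag{5}$$

or, clearing the denominators,

$$\prod_{x\colon w(x)>0} p(x)^{w^+(x)} = \prod_{x\colon w(x)<0} p(x)^{w^-(x)}, \qquad w/\mu \in \operatorname{Span}(w_1,\dots,w_l), \tag{6}$$

where $w = w^+ - w^-$ and $w^+, w^- \ge 0$. Equation (6) makes sense outside the
exponential model, i.e. if we assume $p(x) \ge 0$. Assume $\mathcal{X}_0 = \operatorname{Supp} p$ is strictly contained
in $\mathcal{X}$ and satisfies Equation (5). Therefore, $p$ belongs to the exponential model
associated to the space $V_0$, with $1 \oplus V_0 = \operatorname{Span}(w_1|_{\mathcal{X}_0}, \dots, w_l|_{\mathcal{X}_0})^\perp \subset L^2(\mathcal{X}_0, \mu|_{\mathcal{X}_0})$.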



1.2 Toric statistical models 

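From now on we assume that the matrix $A = [T_j(x)]_{j=1,\dots,d;\ x\in\mathcal{X}}$, with rows indexed by
the canonical statistics and columns by the points of $\mathcal{X}$, is nonnegative integer
valued. The nonnegativity assumption does not restrict the class of models we
consider. We define

$$\mathcal{L}^*(A) = \{ y \in \mathbb{Z}^{\mathcal{X}} : y \ne 0,\ Ay = 0 \}$$

to be the lattice of $A$. We denote by $A(x)$, $x \in \mathcal{X}$, the columns of $A$. The model (3) is
written $p(x;t) \propto t^{A(x)} = t_1^{A_1(x)} \cdots t_d^{A_d(x)}$ and it is called the $A$-model.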
Consider the homomorphism $T$ from the polynomial ring $\mathbb{Q}[q(x) : x \in \mathcal{X}]$ into
the polynomial ring $\mathbb{Q}[t_j : j = 1,\dots,d]$ defined by

$$T\colon q(x) \mapsto t^{A(x)}.$$

The kernel of $T$ is a polynomial ideal $\operatorname{Ideal}(A)$, which is called the toric ideal of
$A$. It is proved in [21] that $\operatorname{Ideal}(A)$ is generated as a vector space by the binomials

$$\prod_{x\colon k(x)>0} q(x)^{k^+(x)} - \prod_{x\colon k(x)<0} q(x)^{k^-(x)}, \qquad k \in \mathcal{L}^*(A),$$

and it is generated as an ideal by a finite subset of such binomials, i.e. the binomials
where $k$ is an element of the Graver basis of $\mathcal{L}^*(A)$. Note that the binomials are
homogeneous if, and only if, $1 \in \operatorname{Span}(A)$.

Assume now that $t_1,\dots,t_d$ take nonnegative and not all zero real values and
consider the parameterization

$$q(x) = t^{A(x)}, \qquad x \in \mathcal{X}, \quad t \in Q = \mathbb{R}_+^d \setminus \{0\}.$$

Note that $t^{A(x)} = \prod_{j\colon A_j(x) \ne 0} t_j^{A_j(x)}$. Each $q(x)$ is nonnegative, and strictly positive if
$t_1,\dots,t_d > 0$. Let $I$ be a subset of indices, $I \subset \{1,\dots,d\}$, such that $t_j = 0$ for all $j \in I$.
Then $q(x) = 0$ for all $x \in \mathcal{X}$ such that $A_j(x) \ne 0$ for some $j \in I$.

There exists at least one $x \in \mathcal{X}$ where $q(x) \ne 0$ if, and only if, each column of
$A$ contains at least one zero. In such a case, we have defined a parameterization of
unnormalized probabilities $q$ with parameters in the vertex-less quadrant:

$$Q \ni t \mapsto p(x;t) \propto t^{A(x)}.$$



Let us study the confounding induced by such a parameterization on strictly
positive parameters. If

$$p(x;s) = p(x;t), \qquad x \in \mathcal{X},$$

then the unnormalized probabilities are proportional and

$$\prod_{j=1}^{d} \Big(\frac{s_j}{t_j}\Big)^{A_j(x)} = \text{constant}, \qquad x \in \mathcal{X},$$

or

$$\sum_{j=1}^{d} (\log s_j - \log t_j) A_j(x) = \text{constant}.$$

If, and only if, $1 \in \operatorname{Rank}(A)$, there exist vectors $c = (c_1,\dots,c_d)$ such that $A(x)c =$
constant and $\log s_j - \log t_j = c_j$, $j = 1,\dots,d$, or $s_j = e^{c_j} t_j$. Confounding is reduced
to the confounding of uniform probability.



1.3 Trace, closure, marginal polytope 

In the present section we discuss two general methods under which the reduction
of the support appears, namely the trace operation and the limit operation. For each
event $\mathcal{S} \subset \mathcal{X}$ the trace on $\mathcal{S}$ of the exponential family in (1) is the exponential
family defined on $\mathcal{S}$ by conditioning on $\mathcal{S}$.

We denote by $\mathcal{P}_>$ the convex set of strictly positive densities and by $\mathcal{P}_\ge$ the
convex set of densities. Both sets are endowed with the weak topology, i.e., if $p_n$, $n =
1,2,\dots$, and $p$ are densities, then $\lim_{n\to\infty} p_n = p$ means $\lim_{n\to\infty} p_n(x) = p(x)$ for all
$x \in \mathcal{X}$. In general, the exponential model (1) is not closed in the weak topology. The
extended exponential family is the closure in the weak topology of an exponential
family (1). An extended exponential family according to this definition is a set of
densities. A proper parameterization of the extended family requires the use of the
expectation parameters and the identification of their range.

Definition 1. The convex support, cf. e.g IHIJIEIj or marginal polytope, see [23]
and also [10], of the exponential family (1) is the convex hull of $\mathcal{Y} = T(\mathcal{X})$, denoted
by $M$ in the following.



The previous set-up covers the behavior of the exponential family and its
parameterization with the expectation parameters in the interior of the marginal
polytope, see [2]. The discussion of the parameterization of the extended family
requires the notion of exposed subset.

Definition 2. 1. A face of the marginal polytope $M$ is a subset $F \subset M$ such that
there exists an affine mapping $A\colon \mathbb{R}^m \ni t \mapsto A(t) \in \mathbb{R}$ which is zero on $F$ and
strictly positive on $M \setminus F$.

2. A subset $S \subset \mathcal{X}$ is exposed for the exponential family (2) if $S = T^{-1}(F)$ and $F$
is a face of the marginal polytope.

The following theorem is a minor improvement of known results. 

Theorem 1. Let $\theta_n$, $n = 1,2,\dots$, be a sequence of parameters in Equation (1) such
that for some $q \in \mathcal{P}_\ge$ we have $\lim_{n\to\infty} p(x;\theta_n) = q(x)$, i.e., $q$ belongs to the extended
exponential model.

1. If the support of $q$ is full, $\{q > 0\} = \mathcal{X}$, then $q$ belongs to the exponential family
(1) for some parameter value $\theta = \lim_{n\to\infty} \theta_n$.
2. If the support of $q$ is defective, then the sequence $\theta_n$ is not convergent, $\operatorname{Supp} q =
\{q > 0\}$ is an exposed subset of $\mathcal{X}$, and $q$ belongs to the trace of the exponential
family on the reduced support.

Proof. Let $\mathcal{X}_0 = \{x \in \mathcal{X} : q(x) > 0\}$, $\mathcal{X}_1 = \{x \in \mathcal{X} : q(x) = 0\}$. For each $x \in \mathcal{X}_0$,
we have $\lim_{n\to\infty} \log p(x;\theta_n) = \log q(x)$ by continuity; for each $x \in \mathcal{X}_1$, we have
$\lim_{n\to\infty} \log p(x;\theta_n) = -\infty$. From (4) we get

$$\sum_{x\in\mathcal{X}_0} \log q(x)\, k(x)\mu(x) + \lim_{n\to\infty} \sum_{x\in\mathcal{X}_1} \log p(x;\theta_n)\, k(x)\mu(x) = 0, \tag{7}$$

with $k \in \operatorname{Span}(w_1,\dots,w_l)$.

1. If the set $\mathcal{X}_1$ is empty, then $q$ belongs to the exponential model because Equation
(7) reduces to (4). The convergence $\lim_{n\to\infty} \eta_n = \lim_{n\to\infty} E_{\theta_n}[T] = E_q[T] = \eta$
in $M^\circ$ implies the convergence of the $\theta$ parameters (mod the identifiability
constraints).

2. If the set $\mathcal{X}_1$ is not empty, the second term of the LHS of (7) has to be finite, so
that no linear combination of the $w_i$'s can be definite in sign on $\mathcal{X}_1$. Otherwise, the
limit would diverge. In other words, the problem

$$k\colon \mathcal{X}_1 \ni x \mapsto \sum_{i=1}^{l} \lambda_i w_i(x) \ge 0 \quad\text{and}\quad k(x) \ne 0 \text{ for at least one } x \tag{8}$$

is not satisfiable. By the Theorem of the alternative, see e.g. [19, Ch. 15], the non
satisfiability of (8) is equivalent to the existence of a positive solution $u^{(1)}(x) >
0$, $x \in \mathcal{X}_1$, to the problem

$$\sum_{x\in\mathcal{X}_1} u^{(1)}(x) k(x) \mu(x) = 0, \qquad k \in \operatorname{Span}(w_1,\dots,w_l).$$

The random variable

$$u(x) = \begin{cases} 0 & \text{if } x \in \mathcal{X}_0, \\ u^{(1)}(x) & \text{if } x \in \mathcal{X}_1, \end{cases}$$

is orthogonal to all $w_i$'s, so that there exist $a_0, a_1, \dots, a_m$ such that

$$u(x) = a_0 + \sum_{j=1}^{m} a_j T_j(x). \tag{9}$$

The conclusion on the support now follows from (9). In fact, for each $t \in \mathcal{Y}$ such
that $T^{-1}(t) \subset \mathcal{X}_1$ the linear function $a_0 + \sum_j a_j t_j$ is positive, while for each $t$ such
that $T^{-1}(t) \subset \mathcal{X}_0$ it takes value zero, so that the points in $\mathcal{X}_0$ are the points of an
exposed set of the face of $M$ identified by (9).

Finally, on the support of $q$, $\log q$ is a linear combination of the $T_j$'s, being a limit
in the linear space generated by those functions. □
Theorem 2. If $q$ belongs to the trace of the exponential family (1) with respect to an
exposed subset $S$, then $q$ belongs to the extended exponential model.

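Proof. We generate sequences that admit as limit a generic density in the trace
model by considering a one-dimensional (Gibbs) sub-model. Let $F$ be the face of
the marginal polytope such that $S = T^{-1}(F)$ and let $A$ be an affine function such that
$A(\eta) = 0$ for $\eta \in F$ and $A(\eta) > 0$ for $\eta \in M \setminus F$. We can choose $A$ such that $A \circ T$
belongs to the space generated by $1, T_1, \dots, T_m$, i.e. $A \circ T = a_0 + \sum_{j=1}^{m} a_j T_j$. We
take $a_0 = 0$ if $1 \in \operatorname{Span}(T_j : j = 1,\dots,m)$.

Let $\theta$ be a value of the canonical parameter such that

$$q(x) = \begin{cases} \dfrac{\exp\big(\sum_{j=1}^{m} \theta_j T_j(x)\big)}{\sum_{x\in S} \exp\big(\sum_{j=1}^{m} \theta_j T_j(x)\big)\mu(x)} & \text{if } x \in S, \\[2ex] 0 & \text{if } x \in \mathcal{X} \setminus S. \end{cases}$$

For $\beta \in \mathbb{R}$,

$$\beta A + \sum_{j=1}^{m} \theta_j T_j = \sum_{j=1}^{m} (\beta a_j + \theta_j) T_j + \beta a_0,$$

so that the one-dimensional statistical model

$$p_\beta = \exp\Big(\beta (A - a_0) + \sum_{j=1}^{m} \theta_j T_j - \psi(\beta a + \theta)\Big), \qquad \beta \in \mathbb{R},$$

is a sub-model of (1). The family of densities

$$\frac{p_\beta}{p_0} = \exp\big(\beta (A - a_0) - (\psi(\beta a + \theta) - \psi(\theta))\big)$$

is a one-dimensional exponential family whose canonical statistics $A - a_0$ reaches
its minimum value $-a_0$ on $S$. Therefore, if $\beta_n \to -\infty$ as $n \to \infty$, its limit is the uniform
distribution on $S$ and, consequently, $p_{\beta_n}$ is convergent to $q$. □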
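The limiting argument of the proof can be sketched numerically. The example below assumes a hypothetical model on $\{0,1\}^2$ with $T = (x_1, x_2)$, uniform reference measure, the face $F = \{\eta_1 = 0\}$ of the unit square, and the affine function $A(\eta) = \eta_1$ (so $a = (1, 0)$ and $a_0 = 0$); all names and values are illustrative.

```python
import math
from itertools import product

# Hypothetical toy model: X = {0,1}^2, T(x) = (x1, x2), exposed set S = {x1 = 0}.
states = list(product([0, 1], repeat=2))
theta = (0.4, -1.1)

def p_beta(beta):
    """Gibbs sub-model with canonical parameter beta * a + theta, a = (1, 0)."""
    weights = [math.exp((beta + theta[0]) * x1 + theta[1] * x2) for x1, x2 in states]
    z = sum(weights)
    return [w / z for w in weights]

def trace_on_s():
    """Density of the trace model on S = {x1 = 0} for the same theta."""
    weights = [math.exp(theta[1] * x2) if x1 == 0 else 0.0 for x1, x2 in states]
    z = sum(weights)
    return [w / z for w in weights]

q = trace_on_s()
# Sup-distance between p_beta and q along beta = -1, -10, -50.
gaps = [max(abs(u - v) for u, v in zip(p_beta(beta), q)) for beta in (-1, -10, -50)]
```

The distances in `gaps` shrink as $\beta$ decreases, which is the convergence $p_{\beta_n} \to q$ used in the proof, here observed only for this particular choice of $\theta$ and face.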



2 Extended families 

In this section we assume the exponential family (1) to be of lattice type, i.e. we
assume that the $m \times n$ matrix $A = [T_j(x)\mu(x)]$, $j = 1,\dots,m$ and $x \in \mathcal{X}$, is
non-negative integer valued. Hence, the exponential family can be written as in Equation
(3) and takes the monomial parametric form

$$p(x;\zeta) \propto \prod_{j\colon A_j(x)>0} \zeta_j^{A_j(x)}, \qquad \zeta_j > 0, \quad j = 1,\dots,m. \tag{10}$$

In [9] the statistical model (10) is called the A-model, see also IS]. If all $\zeta_j$'s are
positive, then (10) is the exponential family with a different parameterization. If we
let one, or more, of the $\zeta_j$'s be zero, either the monomials in (10) are zero for all
$x \in \mathcal{X}$, in which case no probability is defined, or the monomials are non-zero for
some $x$, giving rise to a statistical model with restricted support, see the discussion
in El.

Each integer vector $k$ such that $Ak = 0$, i.e. $k \in \ker_{\mathbb{Z}} A$, splits into its positive and
negative part, $k = k^+ - k^-$, and we have

$$\prod_{x\colon k^+(x)>0} p(x;\zeta)^{k^+(x)} = \prod_{x\colon k^-(x)>0} p(x;\zeta)^{k^-(x)}, \qquad k \in \ker_{\mathbb{Z}} A. \tag{11}$$

The statistical model defined by the infinite system of binomial equations (11) is
called the toric model of $A$, as defined in [13]. Again, if all the probabilities in (11)
are positive, then the toric model is just the exponential family. If some probabilities
are zero, then the toric model contains the A-model. In fact, substitution of (10) into
(11) leads to an algebraic identity, without any restriction on the parameters $\zeta_j$.
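The substitution argument can be checked exactly on a small case. The sketch below assumes a hypothetical matrix with rows $(1,1,1)$ and $(0,1,2)$, whose integer kernel is spanned by $k = (1,-2,1)$; the names `a_model` and `binomial_gap` are illustrative.

```python
from fractions import Fraction

# Hypothetical model matrix on three states; A k = 0 for k = (1, -2, 1).
A = [[1, 1, 1],
     [0, 1, 2]]
k = [1, -2, 1]

def a_model(zeta):
    """Normalized A-model probabilities, product over the j with A_j(x) > 0."""
    q = []
    for x in range(3):
        value = Fraction(1)
        for j, zj in enumerate(zeta):
            if A[j][x] > 0:
                value *= Fraction(zj) ** A[j][x]
        q.append(value)
    total = sum(q)
    return [v / total for v in q]

def binomial_gap(p):
    """Difference between the two sides of the binomial equation for k."""
    lhs, rhs = Fraction(1), Fraction(1)
    for kx, px in zip(k, p):
        if kx > 0:
            lhs *= Fraction(px) ** kx
        elif kx < 0:
            rhs *= Fraction(px) ** (-kx)
    return lhs - rhs

interior = a_model((2, 3))      # all parameters positive
border = a_model((2, 0))        # one parameter set to zero
point_mass = [0, 0, 1]          # satisfies the binomial, not reachable by this A-model
```

Both `interior` and `border` satisfy the binomial equation exactly. The point mass at the last state also satisfies it, but in this toy matrix it cannot be written in the monomial form, since a vanishing first monomial forces the first parameter, and hence every monomial, to zero.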

The existence of a finite generating set for Equation (11) is discussed in detail in
[9], see also JSl. Moreover, in [9] it is proved that each probability in the extended
exponential family satisfies (11). We shall obtain a related result in a different way.

Consider a second $l \times n$ matrix $B$ with the same integer kernel as $A$. The exponential
model would be the same, but the border cases of the A-model could be different
from the border cases of the B-model. The problem of finding a suitable maximal
monomial model was considered first in [16] and it is fully discussed in [11].
Rapallo's method has been applied in fE\ to the Bayesian analysis of tables with
structural zeros. Here, we show that all of the extended exponential family is actually
parameterized by this maximal monomial model. For a related approach see also
ma.

The maximality of the monomial model is defined as follows. Consider the model
matrix $A \in \mathbb{Z}^{m \times n}$. If the constant vector 1 does not belong to the row space, switch
to the matrix $[1\ A]$ which defines the same exponential model. Let the column span
of the orthogonal matrix $K = [w_1 \cdots w_l] \in \mathbb{Z}^{n \times l}$ be $\ker_{\mathbb{Q}} A$. The integer matrix $K$ can
be computed by a symbolic algebra software, such as ||5]|2ll. Numeric software
might be unsuitable because it will normally produce floating point unit vectors, not
integer vectors.
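As an illustration of the exact computation, the following sketch uses rational arithmetic from the Python standard library in place of a symbolic algebra system. It returns integer vectors spanning $\ker_{\mathbb{Q}} A$, which need not form a lattice basis of $\ker_{\mathbb{Z}} A$; the function name `integer_kernel` is an assumption.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def integer_kernel(A):
    """Integer vectors spanning ker_Q A, via exact Gauss-Jordan elimination."""
    rows = [[Fraction(v) for v in row] for row in A]
    n = len(rows[0])
    pivots, r = [], 0
    for c in range(n):
        if r == len(rows):
            break
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue                                  # free column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        rows[r] = [v / rows[r][c] for v in rows[r]]   # normalize the pivot to 1
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[free] = Fraction(1)
        for i, c in enumerate(pivots):
            v[c] = -rows[i][free]
        # Clear denominators, then divide by the common factor.
        scale = reduce(lambda a, b: a * b // gcd(a, b), (x.denominator for x in v), 1)
        ints = [int(x * scale) for x in v]
        g = reduce(gcd, ints)
        basis.append([x // g for x in ints])
    return basis

K_columns = integer_kernel([[1, 1, 1, 1], [0, 1, 0, 1], [0, 0, 1, 1]])
```

For the hypothetical $3 \times 4$ matrix above the kernel is spanned by the single vector $(1,-1,-1,1)$, returned with exact integer entries rather than a floating point unit vector.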

Consider all possible rows of a non-negative matrix equivalent to $A$, i.e. producing
the same statistical model when all the parameters are strictly positive:

$$\mathcal{B} = \{ b \in \operatorname{Span}_{\mathbb{Q}}(A) : b \ge 0,\ b \in \mathbb{Z}^n \} = \{ b \in \mathbb{Z}^n : b \ge 0,\ b^T K = 0 \}.$$

The set $\mathcal{B}$ is closed for the sum of vectors. It is proved in [20] that a unique
inclusion-minimal generating set, called Hilbert basis, exists. The Hilbert basis can
be computed by symbolic software ||5]|22l. It is a $\mathbb{Q}$-generating set but it is usually
much larger than a lattice basis.

The following theorem was stated first in [16] without a complete proof, see also
the discussion in [11].

Theorem 3. Let us consider the set $\mathcal{B}$ of non-negative and non-zero integer vectors
that are orthogonal to $\ker_{\mathbb{Z}} A$ and let $b_1, \dots, b_l$ be its unique Hilbert basis. Define
an $l \times n$ matrix $B$ whose rows are the elements of the Hilbert basis. Hence, the
extended exponential family is fully parametrized by the B-model with non-negative
parameters, i.e. each one of the maximal exposed subsets of the A-model is obtained
by letting one of the $\zeta_j$'s be zero.

Proof. The constant vector belongs to $\mathcal{B}$, therefore $1, b_1, \dots, b_l$ is a $\mathbb{Q}$-vector
generating set, possibly non-minimal. In fact, any rational basis of $\operatorname{Span}_{\mathbb{Q}}(A)$ becomes an
integer basis by multiplication by a suitable integer; the integer basis is transformed
to an integer positive basis by adding, where needed, a constant integer vector; each
of the vectors obtained in such a way belongs to $\mathcal{B}$.

The sets $S_j = \{x \in \mathcal{X} : b_j(x) = 0\}$, $j = 1,\dots,l$, are non empty. In fact, if $m_j =
\min_x b_j(x) > 0$, as $b_j(x) \ne m_j$ for some $x$, the vector $b_j - m_j 1$ belongs to $\mathcal{B}$ and
therefore can be represented as

$$b_j(x) - m_j = \sum_{i=1}^{l} n_i b_i(x), \qquad x \in \mathcal{X}.$$

If $n_j = 0$, the basis is not minimal. If $n_j \ge 1$, subtracting $b_j(x)$ from both sides, we
get, by inspection of the signs of the two sides, that $m_j = 0$.

Each of the $S_j$'s is an exposed set of the exponential family. In fact, each element
of the Hilbert basis belongs to the row $\mathbb{Q}$-Span of the original matrix $A$, so that

$$b_j(x) = \beta_{0j} + \sum_{i=1}^{m} \beta_{ij} a_i(x), \qquad j = 1,\dots,l,$$

where $a_i$ is the $i$-th row of $A$. The definition of exposed set is easily checked.
Vice-versa, let $S$ be an exposed set, i.e.

$$b(x) = \beta_0 + \sum_{i=1}^{m} \beta_i a_i(x),$$

with $S = \{x : b(x) = 0\}$ and $b(x) > 0$ for each $x \notin S$. As $A$ has integer entries, the
coefficients $\beta_0, \beta_1, \dots, \beta_m$ can be chosen to have integer values, therefore $b \in \mathcal{B}$ and
it is a sum of elements of the Hilbert basis,

$$b(x) = \sum_{j=1}^{l} \alpha_j b_j(x), \qquad \alpha_j \in \mathbb{Z}_+, \quad j = 1,\dots,l.$$

Therefore, $S = \bigcap_{j\colon \alpha_j \ne 0} S_j$. □

Remark 1. The additive representation of $b$ for maximal exposed sets contains only
one term. However, the Hilbert basis might contain an element $b_j$ such that its zero
set $S_j$ is the intersection of other $S_i$'s. In such a case, such a $b_j$ could be dropped
from the B-model without losing any part of the extended family.
3.1 4-cycle

The 4-cycle is the exponential family

$$\exp\big(\theta_D D + \theta_C C + \theta_B B + \theta_A A + \theta_{BA} BA + \theta_{CB} CB + \theta_{DC} DC + \theta_{DA} DA - \psi(\theta)\big)$$

where $A, B, C, D$ are the coordinates of $\mathcal{X} = \{\pm 1\}^4$. The model matrix $A$ and the $B$
matrix are shown in the following edited R output:


[R output: model matrix $A$, with columns D, C, B, A, BA, CB, DC, DA, and matrix $B$, with columns $b_1, \dots, b_{24}$; rows are indexed by the sign patterns (+++-, ++-+, ...) of the points of $\mathcal{X}$.]

In this example all the $b_j$ vectors are binary vectors, which implies they are all
indispensable. The vectors $F_j = 1 - b_j$ are the indicator functions of the $S_j$ sets. The
polynomial representation is (after multiplication by 1/16):

[Table: coefficients of the polynomials $F_1, \dots, F_{24}$, with entries 12, 4, 0 and $-4$.]

e.g.

$$F_1 = \frac{3}{4} - \frac{1}{4} D + \frac{1}{4} C + \frac{1}{4} DC,$$

that is $DC = 1 + D - C$ on $S_1$.

The Gröbner basis of each ideal $(A^2 - 1, B^2 - 1, C^2 - 1, D^2 - 1, F_j - 1)$ reveals in
a different way the aliasing induced on each facet.
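The aliasing on one facet can also be checked by direct enumeration. The sketch below assumes that the first column of the coefficient table reads 12, $-4$, 4, 4 on the monomials 1, D, C, DC, i.e. $F_1 = (12 - 4D + 4C + 4DC)/16$; this reading, and the function name `f1`, are assumptions.

```python
from fractions import Fraction
from itertools import product

def f1(d, c):
    """Assumed polynomial F_1 = (12 - 4 D + 4 C + 4 DC) / 16."""
    return Fraction(12 - 4 * d + 4 * c + 4 * d * c, 16)

points = list(product([1, -1], repeat=4))            # coordinates (A, B, C, D)
values = {f1(d, c) for (_a, _b, c, d) in points}     # an indicator takes values 0 and 1

# S_1 is the set where the indicator F_1 equals 1.
s1 = [(a, b, c, d) for (a, b, c, d) in points if f1(d, c) == 1]
aliasing_on_s1 = all(d * c == 1 + d - c for (_a, _b, c, d) in s1)
aliasing_off_s1 = all(d * c == 1 + d - c for (_a, _b, c, d) in points if f1(d, c) == 0)
```

Under this reading $F_1$ is the indicator of a 12-point set on which the interaction $DC$ is aliased with $1 + D - C$, while the relation fails on the remaining four points.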
Next: polynomial representation of the model. 



3.2 Markov chain 



Let $X_t$, $t = 0, 1, \dots, n$, be a Markov chain with stationary transitions on the binary
state space $\{0,1\}$. Let us denote by $t_x = P(X_0 = x)$, $x = 0,1$, the initial probability
and by $t_{xy} = P(X_1 = y \mid X_0 = x)$, $x,y = 0,1$, the transition probabilities. For each
trajectory $\omega \in \mathcal{X} = \{0,1\}^{n+1}$ the probability of the trajectory is

$$p(\omega) = t_0^{1 - X_0(\omega)}\, t_1^{X_0(\omega)} \prod_{x,y=0}^{1} t_{xy}^{N_{xy}(\omega)}, \tag{12}$$

where $N_{xy}(\omega)$ is the number of transitions from $x$ to $y$ appearing in the trajectory $\omega$.
This is an instance of the toric model of the $\mathcal{X} \times 6$ matrix whose rows are

$$[\,(1 - X_0) \quad X_0 \quad N_{00} \quad N_{01} \quad N_{10} \quad N_{11}\,].$$

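Let us compute the confounding, i.e. find the vectors $c \in \mathbb{R}^6$ such that

$$c_0 (1 - X_0(\omega)) + c_1 X_0(\omega) + c_{00} N_{00}(\omega) + c_{01} N_{01}(\omega) + c_{10} N_{10}(\omega) + c_{11} N_{11}(\omega) = a, \qquad \omega \in \mathcal{X},$$

for some $a$. Note the following equalities: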

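$$N_{00} = \sum_{t=1}^{n} (1 - X_{t-1})(1 - X_t) = n - X_0 - 2 \sum_{t=1}^{n-1} X_t - X_n + \sum_{t=1}^{n} X_{t-1} X_t,$$

$$N_{01} = \sum_{t=1}^{n} (1 - X_{t-1}) X_t = \sum_{t=1}^{n-1} X_t + X_n - \sum_{t=1}^{n} X_{t-1} X_t,$$

$$N_{10} = \sum_{t=1}^{n} X_{t-1} (1 - X_t) = X_0 + \sum_{t=1}^{n-1} X_t - \sum_{t=1}^{n} X_{t-1} X_t,$$

$$N_{11} = \sum_{t=1}^{n} X_{t-1} X_t.$$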



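Expanding the equation for $c$ and observing that the vectors $1$, $X_0$, $\sum_{t=1}^{n-1} X_t$, $X_n$,
$\sum_{t=1}^{n} X_{t-1} X_t$ are linearly independent, we obtain, equating to zero the coefficient of
each vector, that

$$\begin{aligned}
c_0 + n c_{00} &= a, \\
c_1 - c_0 - c_{00} + c_{10} &= 0, \\
-2 c_{00} + c_{01} + c_{10} &= 0, \\
-c_{00} + c_{01} &= 0, \\
c_{00} - c_{01} - c_{10} + c_{11} &= 0.
\end{aligned}$$

The solution of the previous system is

$$c_0 = c_1, \qquad c_{00} = c_{01} = c_{10} = c_{11}.$$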
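The counting identities and the form of the confounding vectors can be verified by brute force on short trajectories. The following sketch enumerates all binary trajectories for a hypothetical length $n = 4$; the helper names `row`, `n00_identity` and `combo` are illustrative.

```python
from itertools import product

n = 4                                               # hypothetical number of transitions
trajectories = list(product([0, 1], repeat=n + 1))  # all omega in {0,1}^(n+1)

def row(omega):
    """Row ((1 - X0), X0, N00, N01, N10, N11) of the model matrix at omega."""
    counts = {(x, y): 0 for x in (0, 1) for y in (0, 1)}
    for x, y in zip(omega, omega[1:]):
        counts[(x, y)] += 1
    return [1 - omega[0], omega[0],
            counts[(0, 0)], counts[(0, 1)], counts[(1, 0)], counts[(1, 1)]]

def n00_identity(omega):
    """Right-hand side of the expansion of N00 in terms of the X_t."""
    pairs = sum(omega[t - 1] * omega[t] for t in range(1, n + 1))
    return n - omega[0] - 2 * sum(omega[1:n]) - omega[n] + pairs

def combo(c):
    """Set of values taken by the linear combination c . row(omega)."""
    return {sum(ci * ri for ci, ri in zip(c, row(w))) for w in trajectories}

identity_ok = all(row(w)[2] == n00_identity(w) for w in trajectories)
constant = combo([2, 2, 5, 5, 5, 5])        # c0 = c1 and c00 = c01 = c10 = c11
not_constant = combo([2, 3, 5, 5, 5, 5])    # c0 != c1
```

A vector with $c_0 = c_1$ and equal transition coefficients gives a constant combination ($c_0 + n c_{00}$), while breaking $c_0 = c_1$ produces two distinct values.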
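It follows that an identifiable parameterization of the exponential model from
(12) is

$$p(\omega) = t_0^{1 - X_0(\omega)}\, t_1^{X_0(\omega)} \prod_{x,y=0}^{1} t_{xy}^{N_{xy}(\omega)}, \qquad t_0 + t_1 = 1, \quad \sum_{x,y} t_{xy} = 2, \tag{13}$$

while the Markov chain model is the submodel

$$p(\omega) = t_0^{1 - X_0(\omega)}\, t_1^{X_0(\omega)} \prod_{x,y=0}^{1} t_{xy}^{N_{xy}(\omega)}, \qquad t_0 + t_1 = 1, \quad t_{00} + t_{01} = 1, \quad t_{10} + t_{11} = 1. \tag{14}$$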

;r,.v=0 

The orthogonal space of the model matrix is generated by the vector $k =
(n, n, -1, -1, -1, -1)$